AD  672928 


CONVERGENCE  CONDITIONS  FOR  NONLINEAR  PROGRAMMING  ALGORITHMS 

By 


W.  I.  Zangwill 


Working  Paper  No.  197 
Revised  March,  1968 

Center  for  Research  in  Management  Science 
University  of  California 
Berkeley 




This research was supported (in part) by a grant from the Office of
Naval Research, Grant No. ONR-NR-047-069, to the University of California, and
administered through the Center for Research in Management Science. Reproduction
in whole or part is permitted for any purpose of the United States
Government.



Reproduced by the

CLEARINGHOUSE
for Federal Scientific & Technical
Information, Springfield, Va. 22151


W. I. Zangwill

School  of  Business  Administration 
University  of  California 
Berkeley,  California 

Abstract 

Conditions  which  are  necessary  and  sufficient  for  convergence  of  a 
nonlinear  programming  algorithm  are  stated.  It  is  also  shown  that  the 
convergence  conditions  can  be  easily  applied  to  most  programming  algorithms. 
As  examples,  algorithms  by  Arrow,  Hurwicz  and  Uzawa;  Cauchy;  Frank  and 
Wolfe;  and  Newton-Raphson  are  proven  to  converge  by  direct  application  of 
the  convergence  conditions.  Also  the  Topkis-Veinott  convergence  conditions 
for  feasible  direction  algorithms  are  shown  to  be  a  special  case  of  the 
conditions  stated  in  this  paper. 

Background  and  Summary 

Nearly twenty years ago F. John [7], and Kuhn and Tucker [10], in
brilliant papers, discussed when a given point was optimal for a nonlinear
programming problem. Under certain assumptions they gave necessary and
sufficient conditions for a point to be optimal. From a practical concave
programming orientation the question, "Is a given point optimal?" was now
settled. Moreover, their conditions prompted exploration of a broader
problem, viz., given a point which is not optimal, how can an optimal point
be located? In effect, F. John and Kuhn and Tucker answered the static question
of knowing when a given point was optimal, but did not resolve the dynamic
question of how to move from a point which is not optimal to an optimal one.

As partial answers to the latter question numerous nonlinear programming algorithms
have been developed. One of the earliest and best known of these, the
Simplex Method [4], actually predates the F. John and Kuhn-Tucker conditions.
Even a casual glance at the literature reveals the plethora of
algorithmic techniques, each one seemingly different from the next, each
having its own advantages and disadvantages. As is well known, it is
often extremely difficult to prove that an algorithm converges. In fact,
only a small percentage of all suggested procedures have ever been proven
to converge. And even some which at first were thought to converge were
later found to have incorrect or incomplete proofs. Furthermore, each
algorithm seemed to have its own unique and different proof.

It  is  the  purpose  of  this  paper  to  explore  the  similarities  among 
algorithms. Do the Simplex Method, the Newton-Raphson Method, and, in
fact, all programming algorithms have a common essence? And if so,
do  there  exist  features  which  insure  algorithmic  convergence?  In  an 
important  paper  Topkis  and  Veinott  [12],  have  studied  these  questions  for 
the  class  of  feasible  direction  algorithms.  It  will  be  shown  that  their 
conditions  are  subsumed  by  the  conditions  presented  in  this  paper.  But  in 
addition,  this  paper  will  not  only  give  conditions  that  are  sufficient  to 
insure  convergence,  but  also  will  pose  conditions  that  convergent  algorithms 
necessarily  satisfy. 

The  practical  impact  of  these  conditions  will  be  illustrated  by  using 
them to prove convergence of several well-known algorithms. For example, we
correct an error in Uzawa's modification of the Arrow-Hurwicz algorithm. Furthermore,
in most cases the convergence proofs are considerably simpler than
the  original  proofs.  Even  when  there  is  no  obvious  simplification  or 
improvement,  the  convergence  conditions  provide  a  unified  and  straightforward 
method  for  proving  convergence. 

The  Algorithm  as  an  Iterative  Procedure 

Consider the general nonlinear programming problem, called problem (P).

(P)   (1)  maximize $f(x)$

      (2)  subject to $g_i(x) \ge 0$, $i = 1, \ldots, m$,

where all functions are real-valued and $x \in E^n$, n-dimensional Euclidean
space. It is assumed throughout the paper that $f$ is continuous. Define
$F \subset E^n$ as the set of all $x$ which satisfy (2). The set $F$ is the feasible
set. Any point $x \in F$ that maximizes $f$ is said to be an optimal point
for (P). We assume $F \ne \emptyset$, where $\emptyset$ is the null set.

Our  goal  is  to  analyze  algorithmic  procedures  for  solving  problem 
(P).  For  definiteness  assume  the  algorithms  are  for  digital  computers, 
and  therefore,  the  algorithms  generate  a  discrete  sequence  of  points. 
Furthermore, the algorithm need not operate directly upon the points $x$,
but say on related points $z$. We may thus view an algorithm as a rather
sophisticated iterative procedure that, given a point $z^k$, either stops or
generates a successor $z^{k+1}$. For generality assume that the points $z$ on
which the procedure operates need not be in $E^n$. Merely require that they
be defined on a given metric space $(V, \rho)$. Often the metric space will in
fact be $E^n$ with the usual metric.

Now examine the iterative operation itself. Given a point $z^k$ the
procedure yields a point $z^{k+1}$. It may be possible to actually define a
function $A: V \to V$ such that $z^{k+1} = A(z^k)$. The function then defines the
iterative procedure. Unfortunately, in many cases such a function would not
be well defined, as there may not be a unique value $A(z)$ for a given $z$.

As an example, consider the Simplex Method and suppose the point $z$ has
just been generated. The point $z$ is a basic feasible solution of the
constraining linear inequalities. Now assume that the next point $y$, also
a basic solution, is to be generated. The point $y$, called a successor point,
may not be well defined, as there might be a tie in the choice of the variable
to enter the basis. That is, there are situations in which several possible $y$
can conceivably be generated from $z$. Similar ambiguity about the successor
to a point $z$ arises in other algorithms. We are therefore forced to
consider procedures that may generate a $y$ in some set, the set being
the set of all possible successor points that the iterative operation could
conceivably generate from a particular $z$.

It should also be observed that the procedure might depend upon the
number of iterations $k$ that have already taken place. A procedure which does not
depend upon $k$ is said to be autonomous. As autonomous procedures are so
numerous, an autonomous iterative procedure will be defined first. This
definition will then motivate the more abstract definition for the general
iterative procedure.

The  Autonomous  Iterative  Procedure 

Consider a particular problem (P) and a given metric space $(V, \rho)$.
Letting $\mathcal{P}(V)$ denote the power set of $V$, define a point-to-set mapping
$A: V \to \mathcal{P}(V)$.

Then the autonomous iterative procedure operates as follows. Given $z^1 \in V$,
assume $z^2, \ldots, z^k$ have been generated. Then, if $A(z^k) = \emptyset$ the procedure
stops. Otherwise any $y \in A(z^k)$ is a possible value for $z^{k+1}$, and furthermore
$z^{k+1} \in A(z^k)$.
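The autonomous procedure just defined can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the map `toy_map`, the helper name `run_autonomous`, the iteration cap, and the random tie-breaking rule are all hypothetical and do not come from the paper. The point-to-set map returns a Python set, and an empty set plays the role of the stopping condition.

```python
import random

def run_autonomous(A, z1, max_iters=1000, pick=None):
    """Iterate z^{k+1} in A(z^k) until A(z^k) is empty or max_iters is reached."""
    # Default tie-breaking rule (an assumption): choose any successor at random.
    pick = pick or (lambda successors: random.choice(sorted(successors)))
    trajectory = [z1]
    z = z1
    for _ in range(max_iters):
        successors = A(z)
        if not successors:       # A(z^k) is the empty set: the procedure stops
            break
        z = pick(successors)     # any y in A(z^k) is an admissible z^{k+1}
        trajectory.append(z)
    return trajectory

def toy_map(z):
    # Hypothetical point-to-set map on the nonnegative integers:
    # from z > 0 the procedure may move to z - 1 or z - 2; at 0 it stops.
    if z <= 0:
        return set()
    return {y for y in (z - 1, z - 2) if y >= 0}

path = run_autonomous(toy_map, 7)
print(path)
```

Because several successors are admissible at most points, different runs may print different trajectories, which is exactly the non-uniqueness that motivates a set-valued map rather than a function.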

The  more  general  definition  will  now  be  stated. 

The  Iterative  Procedure 

Consider a particular problem (P) and a given metric space $(V, \rho)$.
For all $k \ge 1$ define a set $V_k \subset V$. For any point $z^k \in V_k$ define a set
$A_k(z^k) \subset V_{k+1}$.

The iterative procedure is as follows. Given $z^1 \in V_1$, assume $z^2, \ldots, z^k$
have been generated. If $\emptyset = A_k(z^k)$, the procedure stops. Otherwise any
$y \in A_k(z^k)$ is a possible value of $z^{k+1}$, and furthermore, $z^{k+1} \in A_k(z^k)$.

It should be clear that the set $A(z^k)$ is the set of all successors to
$z^k$ for the autonomous procedure, while for the general procedure $A_k(z^k)$
is that set.

Before continuing it is useful to develop some notation for subsequences.
The letter $K$, perhaps superscripted, will denote an infinite subsequence
of the integers. Any subsequence of $\{z^k\}_{k=1}^{\infty}$ can be denoted $\{z^k\}_{k \in K}$ for
an appropriate $K$. If the subsequence converges to a point $z^\infty$ we write
$z^k \to z^\infty$, $k \in K$. The subsequence $\{z^{k+1}\}_{k \in K}$ is simply the subsequence formed
by adding 1 to each $k \in K$. If $z^{k+1} \to z^{\infty+1}$, $k \in K$, then $z^{\infty+1}$ is the
limit of the subsequence $\{z^{k+1}\}_{k \in K}$. The notation $\{z^k\}_{k \in K^1}$, where $K^1 \subset K$,
will mean an infinite subsequence of the subsequence $\{z^k\}_{k \in K}$.


The  Convergence  Conditions 

Our  immediate  goal  is  to  determine  some  conditions  that  are  necessary 
and  sufficient  for  an  iterative  procedure  to  be  a  convergent  algorithm.  But 
first  the  concept  of  convergence  must  be  clarified.  It  is  difficult  to 
write  a  foolproof  definition  of  convergence  other  than  the  tautology  that 
convergence  is  the  property  which  all  convergent  algorithms  have.  To  begin 
with we specify a set $\Omega \subset V$ called the solution set. Any point $z \in \Omega$ is
called a solution point or solution, and the algorithm will seek points in $\Omega$.

The set $\Omega$ will be defined by some given property, the property perhaps
depending on the problem and the algorithm under consideration. Often $\Omega$
will be the set of optimal points to problem (P). However, many other
properties can be used to define $\Omega$. For example $\Omega$ could be any one of
the following: the set of all points in an $\varepsilon$-neighborhood of an optimal
point, all roots of an equation, all efficient or Pareto points, all
equilibrium points, etc. Nevertheless, it is assumed that for any given
problem and algorithm the set of solution points $\Omega$ has been defined.

Of course, $\Omega$ may turn out to be empty for certain problems.

Ideally,  we  would  like  an  algorithm  either  to  determine  a  solution  point 
if  one  exists  or  to  indicate  that  a  solution  does  not  exist.  In  addition, 
if  a  solution  does  not  exist,  it  should  tell  us  why  such  a  point  does  not 
exist.  Unfortunately,  such  properties  are  far  too  stringent  to  impose  upon 
any  conceivable  algorithm  implementable  on  any  conceivable  digital  computer. 
We,  therefore,  adopt  a  somewhat  practical  definition  of  an  algorithm  based 
upon  the  properties  of  extant  convergent  algorithms. 

A  convergent  algorithm  is  an  iterative  procedure  with  the  following 
properties: 

a)  If the procedure stops at a point $z$, then the algorithm indicates
either that no solution exists or that $z$ is a solution. Also if a point
$z^k$ is a solution, then either $A_k(z^k) = \emptyset$ or $y \in A_k(z^k)$ implies $y$ is
a solution.

b)  If the procedure generates an infinite sequence of points none of
which are solutions, then if all points are not on a compact set
no solution point exists, while if all points are on a compact set the
limit of any convergent subsequence is a solution point.

Sufficient  Conditions 

Certain conditions known as convergence conditions may now be stated
such that if an algorithm satisfies these conditions it is a convergent
algorithm. The first condition is, roughly speaking, a compactness condition.
A key complication in nonlinear programming algorithms that may arise due to
lack of compactness is that there may be no optimal point to problem (P).
In this case the maximum operation on $f$ has to be replaced by a supremum
operation. Condition I is intended to circumvent such a problem.

Condition I.  a) If for some $z$ and $k$, $A_k(z) = \emptyset$, then the algorithm
indicates either that $z$ is a solution or that no solution exists. Should
$z^k$ be a solution, then either $A_k(z^k) = \emptyset$ or $y \in A_k(z^k)$ implies $y$ is
a solution.  b) If the procedure generates an infinite sequence of points
none of which are solution points, then if a solution exists there is a
compact set $X$ such that $z^k \in X$ for all $k$.

This  condition  is  akin  to  similar  assumptions  made  for  nonlinear 
programming  algorithms  [6,  15,  16] .  If  anything  it  is  somewhat  less 
restrictive  than  most  assumptions  of  this  type. 

Condition II is the crucial assumption that guarantees convergence.

Condition II. If $z^k \in X$, a compact set, for all $k$, then there
exists a continuous function $Z: X \to E^1$ such that:

II-a)  Given any point $z^k$ there exists an $L_k$ such that for
all $\ell \ge L_k + k$

$Z(z^\ell) \ge Z(z^k)$.

II-b)  Suppose the algorithm generates an infinite sequence of points
none of which are solutions. Also suppose there exists a
convergent subsequence $z^k \to z^\infty$, $k \in K$, such that $z^\infty$ is not
a solution. Then there is a $K^1$ such that $z^k \to z^*$, $k \in K^1$, and

$Z(z^*) > Z(z^\infty)$.

The previously developed conditions will now be proven sufficient to
insure that an iterative procedure is actually a convergent algorithm, in that
it will satisfy the definition of convergence.

Theorem 1. Consider an iterative procedure on a metric space $(V, \rho)$
and assume conditions I and II hold. Then the procedure is a convergent
algorithm.

Proof. By I-a we are assured that if the algorithm stops at $z$ then
either $z$ is a solution or no solution point exists. Also if $z$ is
a solution any successor point is also a solution.

Consider now that the procedure generates an infinite sequence of points
$\{z^k\}_{k=1}^{\infty}$ none of which are solutions. If the points are not all on a compact
set, then by I-b no solution point exists. If all points are on a compact
set, then any subsequence must contain a convergent subsequence. It only
remains to prove that the limit of any convergent subsequence must be a
solution.

It first will be shown that the sequence $\{Z(z^k)\}_{k=1}^{\infty}$ itself has a limit.
Applying II-a there exists a sequence $\{z^{k^*}\}$ such that

(3)  $Z(z^{k^*}) = \min\{Z(z^i) \mid i \ge k\}$.

Furthermore, the sequence $\{Z(z^{k^*})\}$ is monotonic increasing. Also by
compactness of $X$ a convergent subsequence $\{z^{k^*}\}_{K^*}$ may be extracted from
$\{z^{k^*}\}$ such that $z^{k^*} \to z^*$, $k^* \in K^*$. By monotonicity

(4)  $\lim_{k^* \to \infty} Z(z^{k^*}) = \lim_{k^* \in K^*} Z(z^{k^*}) = Z(z^*)$

where the final equality is by continuity of $Z$.

Now consider any convergent subsequence $z^{k'} \to z'$, $k' \in K'$. By continuity
$\lim_{k' \in K'} Z(z^{k'}) = Z(z')$. Given $k^*$ and using (3) we may select $k' \in K'$ so
large that $Z(z^{k'}) \ge Z(z^{k^*})$. Hence

(5)  $Z(z') \ge Z(z^*)$.

But by II-a given any $k' \in K'$ there is an $L_{k'}$ such that $\ell \ge L_{k'} + k'$
implies

$Z(z^{k'}) \le Z(z^\ell)$.

Thus for $k^*$ large enough

$Z(z^{k'}) \le Z(z^{k^*})$.

Monotonicity of $\{Z(z^{k^*})\}$ implies $Z(z^{k'}) \le Z(z^*)$. Thus

(6)  $Z(z') \le Z(z^*)$.

By (5) and (6)

$Z(z^*) = Z(z')$.

As this holds for any limit point $z'$ it must be that

(7)  $\lim_{k \to \infty} Z(z^k) = Z(z^*)$.

Up to this point II-b has not been employed. It will be used now.
Let $z^k \to z^\infty$, $k \in K$. It must be proven that $z^\infty$ is a solution. Assume
$z^\infty$ is not a solution. Then by II-b there is a $K^1$ such that $z^k \to z^*$,
$k \in K^1$, and

$Z(z^*) > Z(z^\infty)$.

But by (7) this is impossible, since every limit point of the sequence has the
same value of $Z$. Hence $z^\infty$ must be a solution.

Q.E.D.

The  next  corollary  will  add  insight  into  the  previous  theorem.  It  is 
useful  for  the  autonomous  case.  But  before  we  state  it,  we  must  define  a 
closed  map. 

A map $A: V \to \mathcal{P}(V)$ is said to be closed at $z^\infty$ if

a)  $z^k \to z^\infty$, $k \in K$,

b)  $y^k \to y^\infty$, $k \in K$,

and

c)  $y^k \in A(z^k)$, $k \in K$

imply

$y^\infty \in A(z^\infty)$.

Corollary 1-1. This is the same as Theorem 1 except that Condition II
is replaced by II', where:

Condition II': If $z^k \in X$, a compact set, for all $k$, then there
exists a continuous function $Z: X \to E^1$ such that

II'-a)  If $z$ is a solution, then either the procedure terminates or
$y \in A(z)$ implies

$Z(y) \ge Z(z)$.

While if $z$ is not a solution, $y \in A(z)$ implies

$Z(y) > Z(z)$.


However,

$z^{k+1} \in A(z^k)$, $k \in K^1$.

Using the definition of closedness

$z^* \in A(z^\infty)$.

But $z^\infty$ by assumption is not a solution. Therefore from II'-a

$Z(z^*) > Z(z^\infty)$.

Q.E.D.

For further implications of this theorem see [16].

It should be remarked that if $V$ is a finite set, I-a and II'-a insure
finite convergence.

Necessary  Conditions  for  Convergence 

It will now be shown that using the previous definition of convergent
algorithm, I and II necessarily follow.

Theorem 2: Consider an iterative procedure on $(V, \rho)$ which is a
convergent algorithm. Let $\Omega$, the set of all solution points, be closed.
Then conditions I and II necessarily follow.

Proof: Condition I-a holds easily as it is a) of the definition of
convergence. Assume that an infinite sequence of points is generated none
of which are solutions. If all points are not on a compact set
no solution point exists so that I-b holds.

Assume therefore that $z^k \in X$ for all $k$ where $X$ is compact. It
must now be shown that II-a holds.

(9)  Define $Z(z) = -\inf\{\rho(z, y) \mid y \in \Omega\}$.

It is straightforward to show that $Z$ is a continuous function $Z: X \to E^1$.
Note that $Z(z) = 0$ implies $z \in \Omega$ as $\Omega$ is closed.

Consider the sequence $\{Z(z^k)\}_{k=1}^{\infty}$. It will be established that

(10)  $\lim_{k \to \infty} Z(z^k) = 0$.

But this must be true, for consider any convergent subsequence $z^k \to z^\infty$, $k \in K$. Then

$\lim_{k \in K} Z(z^k) = Z(z^\infty) = 0$,

employing the continuity of $Z$ and the fact that by hypothesis any limit
point $z^\infty$ is a solution.


Now analyze II-a. Let $z^k$ be optimal. Then by a) of the convergence
definition, if there exists a $y \in A_k(z^k)$, then $y$ is also optimal. Hence
for any $\ell \ge k$

$Z(z^k) = Z(z^{k+1}) = Z(z^\ell) = 0$.

Assume now $z^k$ is not optimal. Then $Z(z^k) < 0$. But by (10) there exists
an $L_k$ such that for all $\ell \ge L_k + k$, $Z(z^k) < Z(z^\ell) \le 0$.

Only Condition II-b remains. But II-b holds vacuously as any limit
point of any convergent subsequence must be a solution.

Q.E.D.

Application  of  the  Convergence  Conditions 

In  this  section  a  representative  sample  of  the  better  known  algorithms 
will  be  proved  to  converge  using  the  convergence  conditions.  In  several  cases 
the  convergence  proofs  are  considerably  simpler  than  the  original  proofs. 

But  given  any  algorithm  the  convergence  conditions  provide  a  framework  from 
which  to  start  a  convergence  proof.  Presumably  such  a  framework  is  a  better 
place  to  commence  than  starting  each  proof  from  scratch.  Proving  convergence 
has  not  yet  been  reduced  to  filling  in  the  blanks.  But  it  is  hoped  that  the 
convergence  conditions  will  simplify  the  problem. 

Occasionally in the following algorithms certain assumptions are made
which are slightly stronger than the corresponding assumptions made by the
algorithm's originators. This is done solely for clarity. The
proofs also hold with the weaker assumptions.

Unconstrained  Maxima 

Some of the simplest, yet most useful, algorithms seek the unconstrained
maximum of $f$ over $E^n$. An important class of these algorithms, known as
unconstrained feasible direction algorithms, are easily proven to satisfy
conditions I and II' of Corollary 1-1 and hence are convergent algorithms,
as is now shown.

Each algorithm in this class possesses a continuous function $b: E^n \to E^n$
that serves as a direction. Briefly, given a point $x^k$, the successor $x^{k+1}$
is generated by maximizing $f$ in the direction $b(x^k)$.

Denoting $\nabla f(x)$ as the gradient of $f$ evaluated at $x$, a point $x$ is
termed a solution if $\nabla f(x) b(x) = 0$. We specify the map $A$ by $x' \in A(x)$ if
and only if $x'$ is an optimal solution to

(11)  $\max\{f(x + \tau b(x)) \mid \tau \ge 0\}$.

The unconstrained feasible direction algorithm operates as follows. If $x^k$ is
a solution, stop. Otherwise define the successor by calculating $x^{k+1} \in A(x^k)$.
For simplicity let $\tau^k$ satisfy

(12)  $x^{k+1} = x^k + \tau^k b(x^k)$.


To prove convergence the following two assumptions will be needed.

i)  Either $f$ has no solution point or the set $\{x \mid f(x) \ge f(x^0)\}$
is compact for any $x^0$,

and

ii)  If $x$ is not a solution, then $x' \in A(x)$ implies $f(x') > f(x)$.

It will now be shown that any unconstrained feasible direction algorithm
which satisfies i) and ii) also satisfies conditions I and II'. Assumption i)
insures that I holds since if no $x'$ satisfies equation (11) then there is
no solution point. Condition II'-a is verified by letting $V = E^n$, $x^k = z^k$,
and $Z(z) = f(x)$, because (11) insures that $f(x^k)$ is monotonic. Also ii)
provides that if $x^k$ is not a solution

$f(x^{k+1}) > f(x^k)$.

Condition II'-b will now be established. Let $x^k \to x^\infty$ and $x^{k+1} \to x^{\infty+1}$,
$k \in K$. By continuity $b(x^k) \to b(x^\infty)$, $k \in K$. Assume $x^\infty$ is not a solution;
then by definition of solution $b(x^\infty) \ne 0$. It then follows from (12) that for
some $\tau^\infty \ge 0$, $\tau^k \to \tau^\infty$, $k \in K$.

Using (11), for any $\tau \ge 0$

$f(x^{k+1}) = f(x^k + \tau^k b(x^k)) \ge f(x^k + \tau b(x^k))$.

Taking limits

$f(x^{\infty+1}) = f(x^\infty + \tau^\infty b(x^\infty)) \ge f(x^\infty + \tau b(x^\infty))$.

As this holds for any $\tau \ge 0$

$f(x^{\infty+1}) = \max\{f(x^\infty + \tau b(x^\infty)) \mid \tau \ge 0\}$.

Therefore the map $A$ is closed and condition II'-b holds. The algorithm
converges.

By applying the above reasoning two popular algorithms are seen to
converge. The first is the Cauchy [2] procedure. This procedure assumes
that $f$ is continuously differentiable and defines $b(x) = \nabla f(x)$. The
second is a modified Newton-Raphson algorithm. For this algorithm $f$ is
assumed to have continuous second partial derivatives and its Hessian matrix
at $x$, $H(x)$, is assumed negative definite for all $x$. Defining $H^{-1}(x)$
as the inverse of the Hessian, $b(x) = -H^{-1}(x) \nabla f(x)$. It might also be noted
that the above reasoning provides an alternative to Theorem 2 of Topkis and
Veinott [12].
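As an illustration of the Cauchy procedure, the following Python sketch applies $b(x) = \nabla f(x)$ with a one-dimensional maximization along each direction. Several choices here are assumptions made only for the example and are not taken from the paper: the concave quadratic `f`, the golden-section line search, the search bound `tau_max` (the paper maximizes over all $\tau \ge 0$), and the numerical tolerance used to decide that $\nabla f(x) b(x)$ is zero.

```python
import math

def golden_max(phi, lo, hi, tol=1e-10):
    """Maximize a unimodal function phi on [lo, hi] by golden-section search."""
    inv = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv * (b - a), a + inv * (b - a)
    while b - a > tol:
        if phi(c) > phi(d):          # maximizer lies in [a, d]
            b, d = d, c
            c = b - inv * (b - a)
        else:                        # maximizer lies in [c, b]
            a, c = c, d
            d = a + inv * (b - a)
    return (a + b) / 2

def f(x):
    # Hypothetical strictly concave test function with maximum at (1, -0.5)
    return -(x[0] - 1.0) ** 2 - 2.0 * (x[1] + 0.5) ** 2

def grad_f(x):
    return [-2.0 * (x[0] - 1.0), -4.0 * (x[1] + 0.5)]

def cauchy_ascent(x, tol=1e-8, tau_max=10.0, max_iters=200):
    values = [f(x)]
    for _ in range(max_iters):
        b = grad_f(x)                          # Cauchy direction b(x) = grad f(x)
        if sum(bi * bi for bi in b) <= tol:    # grad f(x) b(x) is numerically zero
            break
        # Approximate equation (11) on the bounded interval [0, tau_max]
        tau = golden_max(
            lambda t: f([xi + t * bi for xi, bi in zip(x, b)]), 0.0, tau_max
        )
        x = [xi + tau * bi for xi, bi in zip(x, b)]   # equation (12)
        values.append(f(x))
    return x, values

x_star, values = cauchy_ascent([4.0, 3.0])
print(x_star, len(values))
```

The list `values` records $Z(z^k) = f(x^k)$ along the run, so the monotonicity required by condition II'-a can be inspected directly. Replacing `grad_f(x)` in the direction step by $-H^{-1}(x) \nabla f(x)$ would give the modified Newton-Raphson variant.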

The Frank and Wolfe Algorithm

The Frank and Wolfe algorithm [6] is for problem (P) when all constraints
$g_i$ are linear and $f$ is continuously differentiable. Assume the feasible
region $F$ is compact. Given any $x \in F$ define $A(x)$ as follows. Solve
the linear programming problem, where $x$ is fixed, via the Simplex Method

(13)  $\max\{\nabla f(x) w \mid w \in F\}$

for an optimal $w^1$. Then $x' \in A(x)$ if and only if $x'$ is an optimal solution
to

(14)  $\max\{f(x + \tau(w^1 - x)) \mid 0 \le \tau \le 1\}$.

Note that for $0 \le \tau \le 1$, $x + \tau(w^1 - x) \in F$. Therefore given $x^1 \in F$ all
successor points will also be feasible.

A point $x$ is called a solution if $\nabla f(x)(w^1 - x) = 0$ where $w^1$ solves
(13). Should $x$ be a solution the procedure stops and $A(x) = \emptyset$. The
algorithm is now specified and we will prove its convergence using conditions
I and II.

Condition I holds immediately as $F$ is compact. Letting $V = F$,
$z = x$, and $Z(z) = f(z)$, equation (14) insures that II-a holds.

Now condition II-b must be established. Let $x^k \to x^\infty$ and $x^{k+1} \to x^{\infty+1}$
for $k \in K$. Assume $x^\infty$ is not a solution. Define $w^k$ as the optimal
point to problem (13) when $x = x^k$. From the theory of linear programming
and the fact that $f$ has continuous derivatives, there must exist a
$K^1 \subset K$ such that $w^k \to w^\infty$, $k \in K^1$, where $w^\infty$ solves (13) for $x = x^\infty$.

Given any fixed $\tau$, $0 \le \tau \le 1$,

$f(x^{k+1}) \ge f(x^k + \tau(w^k - x^k))$.

From the continuity

$f(x^{\infty+1}) \ge \max\{f(x^\infty + \tau(w^\infty - x^\infty)) \mid 0 \le \tau \le 1\} > f(x^\infty)$,

where the final inequality holds as $x^\infty$ not optimal implies

$\nabla f(x^\infty)(w^\infty - x^\infty) > 0$.

Hence, condition II-b holds. The algorithm can also be proved to converge via
Corollary 1-1.

Modifications of the above reasoning can be used to validate that many
similar algorithms satisfy the convergence conditions. In particular, the
decomposable nonlinear programming method of Zangwill [14], Zoutendijk's
methods of feasible directions [17], and the Convex Simplex Method of
Zangwill [15] fall into this category.
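The two steps of the Frank and Wolfe map, the linear subproblem (13) followed by the line search (14), can be sketched as below. The example is a simplification under stated assumptions: the feasible region is taken to be the unit box $[0,1]^2$, so the linear program is solved by inspecting the sign of each gradient component instead of by the Simplex Method; the objective `f`, the golden-section line search, and the tolerance on $\nabla f(x)(w^1 - x)$ are likewise hypothetical choices.

```python
import math

def line_max(phi, lo=0.0, hi=1.0, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal phi on [lo, hi]."""
    inv = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv * (b - a), a + inv * (b - a)
    while b - a > tol:
        if phi(c) > phi(d):
            b, d = d, c
            c = b - inv * (b - a)
        else:
            a, c = c, d
            d = a + inv * (b - a)
    return (a + b) / 2

def f(x):
    # Hypothetical concave objective; its maximizer over the unit box is (0.3, 1)
    return -(x[0] - 0.3) ** 2 - (x[1] - 2.0) ** 2

def grad_f(x):
    return [-2.0 * (x[0] - 0.3), -2.0 * (x[1] - 2.0)]

def lp_vertex(g):
    # Solves (13) over the box [0, 1]^n: set w_i = 1 where the gradient is positive
    return [1.0 if gi > 0 else 0.0 for gi in g]

def frank_wolfe(x, tol=1e-6, max_iters=1000):
    values = [f(x)]
    for _ in range(max_iters):
        g = grad_f(x)
        w = lp_vertex(g)
        d = [wi - xi for wi, xi in zip(w, x)]
        gap = sum(gi * di for gi, di in zip(g, d))   # grad f(x) (w - x)
        if gap <= tol:                               # x is (numerically) a solution
            break
        # Line search (14) on 0 <= tau <= 1 keeps the successor feasible
        tau = line_max(lambda t: f([xi + t * di for xi, di in zip(x, d)]))
        x = [xi + tau * di for xi, di in zip(x, d)]
        values.append(f(x))
    return x, values

x_star, values = frank_wolfe([1.0, 0.0])
print(x_star)
```

Since every iterate is a convex combination of feasible points, feasibility is preserved automatically, and `values` again plays the role of $Z(z^k) = f(x^k)$ in condition II-a.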

The  Direction  Function 


Topkis and Veinott [12] consider the concept of a direction function.
This concept is quite useful for proving that feasible direction algorithms
converge. Let $F$, the feasible region, be compact. A direction $b \in E^n$ is
called a feasible direction at $x \in F$ if for some $t > 0$, $x + \tau b \in F$ for
all $0 \le \tau \le t$. If $b$ is feasible at $x$ and, in addition, $f(x) < f(x + \tau b)$
for all $0 < \tau \le t$, then $b$ is also said to be usable at $x$. Should no
usable direction exist at $x \in F$, the point $x$ is termed a solution.

As $f$ is continuous and $F$ is compact a solution point clearly exists.

For every sequence $\{x^1, x^2, x^3, \ldots, x^k, \ldots\}$ in $F$ we assume there is a direction
function $b$ which assigns to $\{x^1, \ldots, x^k\}$ a feasible direction
$b^k = b(x^1, x^2, \ldots, x^k)$ at $x^k$. All $b^k$ are contained in a compact set.

Let $(x^k, b^k) \to (x^\infty, b^\infty)$, $k \in K$; then the Topkis-Veinott conditions
specify that

i)  For some $t > 0$, $x^k + \tau b^k \in F$ for all $k \in K$ and all
$0 \le \tau \le t$.

ii)  If $b^\infty$ is feasible but not usable at $x^\infty$, then $x^\infty$ is a
solution.

Also there is a real-valued lower semi-continuous step size function
$f^1(x, w)$ defined on the Cartesian product $F \times F$ such that $f^1(x, x + \tau b)$
is continuous in $\tau$. This function satisfies conditions

iii)  $f^1(x, x) = f(x)$ and $f^1(x, w) \le f(w)$ for $w, x \in F$,
and

iv)  if $b$ is usable at $x$ for $f$, then $b$ is usable for
$f^1(x, \cdot)$ at $x$.

The algorithmic procedure is defined as follows. Given $x^1, x^2, \ldots, x^k$, if $x^k$
is a solution the procedure stops. Otherwise the point $x^{k+1}$ is generated
by

(15)  $f^1(x^k, x^{k+1}) = \max\{f^1(x^k, x^k + \tau b^k) \mid \tau \ge 0,\ x^k + \tau b^k \in F\}$.

It must now be shown that any procedure which satisfies the above four
conditions also satisfies I and II. Let $E^n$ be the metric space and define
$V_k \equiv F$ and $z^k = (x^k)$ for all $k$. The algorithm is then defined as follows

$A_k(x^k) = \emptyset$ if $x^k$ is a solution,
$A_k(x^k) = \{x^{k+1}\}$ otherwise.

Also let $Z(x) = f(x)$.

Condition I holds as the set $F$ is compact. Condition II-a holds
because equation (15) and condition iii) insure that $f$ is monotonic.

Condition II-b now must be established. Let $x^k \to x^\infty$, $k \in K$, $x^{k+1} \to x^{\infty+1}$,
$k \in K$, and $b^k \to b^\infty$, $k \in K$. Assume $x^\infty$ is not a solution. It must be
proved that

$f(x^{\infty+1}) > f(x^\infty)$.

By i) and the compactness of $F$, $x^\infty + \tau b^\infty \in F$ for all $0 \le \tau \le t$, and thus
$b^\infty$ is feasible. Since $b^\infty$ is feasible but $x^\infty$ is not a solution, $b^\infty$
must be usable by ii). Then via iv) there exists a $0 < \tau^1 \le t$ such that

(16)  $f^1(x^\infty, x^\infty + \tau^1 b^\infty) > f^1(x^\infty, x^\infty) = f(x^\infty)$.

Furthermore, because $x^k \to x^\infty$, $k \in K$, and $b^k \to b^\infty$, $k \in K$,

$x^k + \tau^1 b^k \to x^\infty + \tau^1 b^\infty$, $k \in K$.

In addition, by i) as $\tau^1 \le t$, $x^k + \tau^1 b^k \in F$ for all $k \in K$.

Using (15) and iii)

$f(x^{k+1}) \ge f^1(x^k, x^{k+1}) \ge f^1(x^k, x^k + \tau^1 b^k)$.

Exploiting the continuity of $f$ and lower semi-continuity of $f^1$,

$f(x^{\infty+1}) = \lim_{k \in K} f(x^{k+1}) \ge \liminf_{k \in K} f^1(x^k, x^{k+1})$

$\ge \liminf_{k \in K} f^1(x^k, x^k + \tau^1 b^k)$

$\ge f^1(x^\infty, x^\infty + \tau^1 b^\infty)$.

Hence II-b holds, for using (16)

$f(x^{\infty+1}) > f(x^\infty)$.

The  fact  that  the  Topkis-Veinott  conditions  are  a  special  case  of  the 
convergence  conditions  I  and  II  establishes  that  the  many  algorithms  proved 
by  their  techniques  can  also  be  proved  by  the  convergence  conditions.  One 
class  of  algorithms  subsumed  by  these  conditions  are  the  so-called  cyclic 
coordinate  ascent  methods.  These  methods  optimize  one  coordinate  at  a  time. 
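A cyclic coordinate ascent method of the kind just mentioned can be sketched as follows. The sketch rests on assumptions introduced only for illustration: the concave quadratic `f`, the box $[-5, 5]^2$ standing in for the compact feasible region $F$, and closed-form one-dimensional maximizers that are specific to this quadratic.

```python
def f(x):
    # Hypothetical concave quadratic; its maximizer is (2, -1) with f = 3
    return -x[0] ** 2 - x[1] ** 2 - x[0] * x[1] + 3.0 * x[0]

def clip(v, lo=-5.0, hi=5.0):
    # Keeps each coordinate inside the assumed box-shaped feasible region
    return max(lo, min(hi, v))

def argmax_coord(x, i):
    # Exact maximizer of f along coordinate i with the other coordinate fixed.
    # Setting the partial derivative to zero gives the formulas below; clipping
    # a one-dimensional concave maximizer to an interval yields the constrained
    # maximizer on that interval.
    if i == 0:
        return clip((3.0 - x[1]) / 2.0)
    return clip(-x[0] / 2.0)

def cyclic_coordinate_ascent(x, sweeps=60):
    x = list(x)
    values = [f(x)]
    for k in range(2 * sweeps):
        i = k % 2                    # cycle through the coordinates in order
        x[i] = argmax_coord(x, i)    # optimize one coordinate at a time
        values.append(f(x))
    return x, values

x_star, values = cyclic_coordinate_ascent([0.0, 0.0])
print(x_star, values[-1])
```

Each step moves along a coordinate direction, so the direction function here simply cycles through the coordinate axes, and the recorded `values` are nondecreasing as condition II-a requires.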

Arrow-Hurwicz-Uzawa Algorithm

The Uzawa iterative adaptation [1] of the Arrow-Hurwicz gradient method
considers the following modification of (P)

maximize  $f(x)$
subject to  $g(x) \ge 0$
            $x \ge 0$

where $f$ is strictly concave and $g(x) = (g_1(x), \ldots, g_m(x))$ is a vector
of the constraints $g_i$, each of which is assumed concave. All functions are
continuously differentiable on $E^n$. It is also supposed that a vector $x^0$
exists such that $x^0 \ge 0$ and $g_i(x^0) > 0$ for all $i$.

Define a Lagrangean function

$$\Psi(x, u) = f(x) + u\,g(x)$$

where $u$ is an $m$ vector. Consider a point $(\bar{x}, \bar{u})$, assumed to exist,
called a saddle point, such that

(19)  $\Psi(\bar{x}, \bar{u}) = \max_{x \ge 0} \Psi(x, \bar{u}) = \min_{u \ge 0} \Psi(\bar{x}, u).$

By strict concavity there is a unique $\bar{x}$ which solves (19). Let
$\bar{U} = \{\bar{u} \mid (\bar{x}, \bar{u}) \text{ is a saddle point of } \Psi(x, u)\}$. It is easy to show that
$\bar{U}$ is compact [1, p. 155].

The algorithm is defined by the difference equations

(20a)  $x^{k+1} = \max[0,\ x^k + \tau \Psi_x(x^k, u^k)]$

(20b)  $u^{k+1} = \max[0,\ u^k - \tau \Psi_u(x^k, u^k)]$

where $x^1 \ge 0$ and $u^1 \ge 0$, $\tau$ is a positive scalar to be specified
subsequently, and $\Psi_x = \nabla_x \Psi(x^k, u^k)$ and $\Psi_u = g(x^k)$ are respectively the
partial derivatives of $\Psi$ with respect to $x$ and $u$.

Given $\varepsilon > 0$ a point $z = (x, u)$ is termed a solution if

$$|\bar{x} - x|^2 < \varepsilon.$$

The algorithm commences with an initial point $z^1 = (x^1, u^1)$ and
recursively generates points $z^k = (x^k, u^k)$ via equations (20a) and (20b)
until a solution is obtained. The algorithm stops at a solution.
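
The recursion (20a)-(20b) can be sketched as follows. This is an illustration
under stated assumptions, not the paper's procedure in full: the toy problem,
the helper names (`ahu_step`, `grad_f`, `g`, `grad_g`), and the hand-picked
constant step `tau = 0.1` are all hypothetical, whereas the paper fixes the
step by a minimization specified later. The stopping test uses the known
$\bar{x}$ of the toy problem, mirroring the definition of a solution above.

```python
def ahu_step(x, u, tau, grad_f, g, grad_g):
    """One pass of (20a)-(20b) for a single constraint g(x) >= 0."""
    gf, gg = grad_f(x), grad_g(x)
    psi_x = [gf[i] + u * gg[i] for i in range(len(x))]  # gradient of Psi in x
    psi_u = g(x)                                        # derivative of Psi in u
    # Both updates use the current pair (x, u), then clip at zero.
    x_next = [max(0.0, x[i] + tau * psi_x[i]) for i in range(len(x))]
    u_next = max(0.0, u - tau * psi_u)
    return x_next, u_next


# Hypothetical toy problem (not from the paper):
#   maximize -(x1 - 2)^2 - (x2 - 1)^2  subject to  2 - x1 - x2 >= 0, x >= 0.
# Its saddle point is x_bar = (1.5, 0.5), u_bar = 1.
def grad_f(x):
    return [-2.0 * (x[0] - 2.0), -2.0 * (x[1] - 1.0)]


def g(x):
    return 2.0 - x[0] - x[1]


def grad_g(x):
    return [-1.0, -1.0]


x_bar = [1.5, 0.5]
eps = 1e-12   # solution tolerance on |x_bar - x|^2
tau = 0.1     # hand-picked step size, an assumption of this sketch

x, u = [0.0, 0.0], 0.0
iterations = 0
while sum((a - b) ** 2 for a, b in zip(x_bar, x)) >= eps and iterations < 10000:
    x, u = ahu_step(x, u, tau, grad_f, g, grad_g)
    iterations += 1

print(iterations, x, u)
```

Note that the stopping rule constrains only $x$; the multiplier $u$ is
typically close to $\bar{u}$ at termination but is not itself tested.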

To specify $\tau$ first define

$$Z(x, u) = \min_{\bar{u} \in \bar{U}} \{ |\bar{x} - x|^2 + |\bar{u} - u|^2 \}.$$

Then

$$\tau = \min \left\{ \frac{(\bar{x} - x)\Psi_x - (\bar{u} - u)\Psi_u}{|\Psi_x|^2 + |\Psi_u|^2} \;\middle|\; \varepsilon \le |\bar{x} - x|^2,\ Z(x, u) \le J,\ \bar{u} \in \bar{U} \right\}$$

where $J = Z(x^1, u^1)$. The value of $\tau$ will be positive because the numerator
of the function being minimized is positive, as the next lemma proves; also,
the function is continuous and the region of minimization is compact.

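Lemma 3.

$$(\bar{x} - x)\Psi_x - (\bar{u} - u)\Psi_u > 0 \quad \text{if } x \ne \bar{x}.$$

Proof: By concavity of $\Psi$ in $x$

$$\Psi(\bar{x}, u) \le \Psi(x, u) + (\bar{x} - x)\Psi_x.$$

Since $\Psi_u = g$,

$$\Psi(x, \bar{u}) - (\bar{u} - u)\Psi_u = \Psi(x, u).$$

Therefore

$$\Psi(\bar{x}, u) - \Psi(x, \bar{u}) \le (\bar{x} - x)\Psi_x - (\bar{u} - u)\Psi_u. \qquad (1)$$

Now by definition of a saddle point, and since $f$ is strictly concave

$$\Psi(x, \bar{u}) < \Psi(\bar{x}, \bar{u}) \le \Psi(\bar{x}, u).$$

Therefore from (1)

$$0 < \Psi(\bar{x}, u) - \Psi(x, \bar{u}) \le (\bar{x} - x)\Psi_x - (\bar{u} - u)\Psi_u.$$

Q.E.D.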
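
A quick numerical sanity check of Lemma 3 is sketched below on a hypothetical
toy problem (not from the paper): $f(x) = -(x_1 - 2)^2 - (x_2 - 1)^2$ with the
single constraint $g(x) = 2 - x_1 - x_2 \ge 0$, whose saddle point is
$\bar{x} = (1.5, 0.5)$, $\bar{u} = 1$. The function name `lemma3_quantity` and the
sampling box $[0, 4]$ are assumptions made for illustration only. For this
particular quadratic the quantity works out to $2|\bar{x} - x|^2$, so it is
positive whenever $x \ne \bar{x}$.

```python
import random

# Hypothetical toy problem (not from the paper):
#   f(x) = -(x1 - 2)^2 - (x2 - 1)^2,  g(x) = 2 - x1 - x2,
# with saddle point x_bar = (1.5, 0.5), u_bar = 1.
X_BAR, U_BAR = (1.5, 0.5), 1.0


def lemma3_quantity(x, u):
    """(x_bar - x) . Psi_x - (u_bar - u) * Psi_u evaluated at (x, u)."""
    # Psi_x = grad f(x) + u * grad g(x), with grad g = (-1, -1).
    psi_x = (-2.0 * (x[0] - 2.0) - u, -2.0 * (x[1] - 1.0) - u)
    # Psi_u = g(x).
    psi_u = 2.0 - x[0] - x[1]
    inner = sum((xb - xi) * p for xb, xi, p in zip(X_BAR, x, psi_x))
    return inner - (U_BAR - u) * psi_u


# Sample nonnegative points (x, u) and record the smallest value observed.
random.seed(0)
samples = [((random.uniform(0, 4), random.uniform(0, 4)), random.uniform(0, 4))
           for _ in range(1000)]
smallest = min(lemma3_quantity(x, u) for x, u in samples)
print(smallest > 0.0)
```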


We now prove convergence via Corollary 1.1. Condition II will be
established first. If $z = (x, u)$ is a solution the procedure terminates.
Now suppose $z^k = (x^k, u^k)$ is not a solution. After some manipulation it
can be shown that [1, page 156]

$$|x^{k+1} - \bar{x}|^2 + |u^{k+1} - \bar{u}|^2 \le |x^k - \bar{x}|^2 + |u^k - \bar{u}|^2
- \tau \{ 2[(\bar{x} - x^k)\Psi_x - (\bar{u} - u^k)\Psi_u] - \tau[|\Psi_x|^2 + |\Psi_u|^2] \}. \qquad (2)$$

We will validate II'-a (reversing inequalities) by proving that

$$Z(x^{k+1}, u^{k+1}) < Z(x^k, u^k). \qquad (3)$$

However, to prove (3) we see from (2) that it is only necessary to show

$$2[(\bar{x} - x^k)\Psi_x - (\bar{u} - u^k)\Psi_u] - \tau[|\Psi_x|^2 + |\Psi_u|^2] > 0. \qquad (4)$$

Employing the definition of $\tau$, the left side of (4) is at least

$$\begin{aligned}
& 2[(\bar{x} - x^k)\Psi_x - (\bar{u} - u^k)\Psi_u]
- \left( \frac{(\bar{x} - x^k)\Psi_x - (\bar{u} - u^k)\Psi_u}{|\Psi_x|^2 + |\Psi_u|^2} \right) (|\Psi_x|^2 + |\Psi_u|^2) \\
&\quad = [(\bar{x} - x^k)\Psi_x - (\bar{u} - u^k)\Psi_u] \\
&\quad > 0
\end{aligned}$$

where the final inequality holds via Lemma 3 as $z^k$ is not a solution. Thus
II'-a) holds.

Condition II'-b) holds immediately as the recursions (20a) and (20b) are
continuous functions.

We observe that from equation (3) for all $k$

$$Z(x^k, u^k) \le J.$$

Therefore all $z^k = (x^k, u^k)$ generated are on a compact set. Furthermore,
by assumption a saddle point exists, therefore condition I also holds. The
algorithm converges.




Other  Algorithms 

Several other algorithms have been proved convergent by application of the
convergence conditions. In particular, loss function methods
such as Zangwill's penalty function method [13] and Fiacco and McCormick's
sequential unconstrained approach [5] have been established using the
convergence conditions. Also cutting plane methods [8] have been considered.

Conclusion

This paper has presented an attempt to unify the convergence proofs of
nonlinear programming algorithms. Both necessary and sufficient conditions
for convergence were discussed. The methods presented seem related to, but
are actually somewhat different from, the differential equation stability
theory of Liapunov [11]. For the special case in which $A(x)$ is a function,
that is, $A(x^k) = x^{k+1}$, and in addition $A(x)$ is continuous, the convergence
conditions may be considered as Liapunov conditions for the nonlinear
programming case. Finally, it should also be clear that, by selecting the
solution set $\Omega$ astutely, algorithms other than nonlinear programming
algorithms can be considered.